[feature] integrate zeknox GPU-acceleration library into gnark #1332

Open, wants to merge 65 commits into base: master

dloghin commented Nov 28, 2024

Description

This PR integrates the zeknox GPU-acceleration library into gnark. Specifically, it targets GPU (NVIDIA CUDA) acceleration of the groth16 backend over BN254. It also adds a new example that proves and verifies a batch of secp256r1 (P256) signatures. In our benchmarks, CPU+GPU execution (with zeknox) is 1.54-1.57X faster than the default CPU-only execution.

In summary, we made the following additions:

  • groth16 over BN254 accelerated with zeknox, under the backend/groth16/bn254/zeknox folder.
  • Timing output in backend/groth16/bn254/prove.go, printed in debug mode.
  • A code example that proves and verifies a batch of secp256r1 (P256) signatures, under examples/p256.
  • Instructions in README.md on how to run gnark with zeknox.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this been tested?

We wrote new tests under backend/groth16/bn254/zeknox and examples/p256. We also ran the existing tests under backend/groth16/bn254 and examples/mimc. Each test was run both without and with the zeknox build tag.

  • Test A: backend/groth16/bn254
      cd backend/groth16/bn254
      go test
      go test -tags zeknox
  • Test B: backend/groth16/bn254/zeknox
      cd backend/groth16/bn254/zeknox
      go test
      go test -tags zeknox
  • Test C: examples/p256
      cd examples/p256
      go test
      go test -tags zeknox
  • Test D: examples/mimc
      cd examples/mimc
      go test
      go test -tags zeknox

How has this been benchmarked?

  • Benchmark A

We ran the P256 example to prove/verify a batch of 10 secp256r1 keys. The steps to run:

cd examples
go build -tags zeknox
./examples
  • Platform A: a Google Cloud Platform g2-standard-32 instance with 32 vCPUs (Intel Xeon), one NVIDIA L4 GPU, and 128 GB RAM.
  • Platform B: an x86-64 AMD Ryzen 9 5950X CPU with 16 cores (32 threads), one NVIDIA RTX 4080 GPU, and 128 GB RAM.

Results

The times below represent the proving time (in milliseconds) for 10 secp256r1 keys.

| Platform   | CPU-only   | CPU+GPU (zeknox) | Speedup |
|------------|------------|------------------|---------|
| Platform A | 5840.96 ms | 3792.48 ms       | 1.54 X  |
| Platform B | 4066.95 ms | 2588.51 ms       | 1.57 X  |

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I did not modify files generated from templates
  • golangci-lint does not output errors locally
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

dloghin marked this pull request as ready for review November 29, 2024 03:42
doutv (Contributor) commented Dec 9, 2024

@ivokub need your help and review!

ivokub (Collaborator) commented Dec 9, 2024

> @ivokub need your help and review!

On it. Would it be possible to allow adding commits directly to the branch for easier review?

ivokub self-requested a review December 9, 2024 23:57
ivokub added the new feature and consolidate (strengthen an existing feature) labels Dec 9, 2024
ivokub added this to the v0.12.0 milestone Dec 9, 2024
doutv (Contributor) commented Dec 10, 2024

> > @ivokub need your help and review!
>
> On it. Would it be possible to allow adding commits directly to the branch for easier review?

Sure, I've granted you push permission: https://github.com/okx/gnark/invitations

Let me delete those examples to keep the PR clean.

ivokub (Collaborator) commented Dec 13, 2024

I'm not able to create a proof for now. In the debug logs, the last action I see is:

14:05:05 DBG Bs.MultiExp done MSMG2 5 took=0.86421 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=6

I guess there is a deadlock somewhere. Have you been able to run the end-to-end prover?

dloghin (Author) commented Dec 16, 2024

Hi Ivo,

May I check: if you use the precompiled zeknox libraries, does your GPU have compute capability 8.6 or 8.9? (only these two are supported by our precompiled libraries).

On our systems, the end-to-end example (go run -tags=zeknox examples/zeknox/main.go) is working.

ivokub (Collaborator) commented Dec 16, 2024

> Hi Ivo,
>
> May I check: if you use the precompiled zeknox libraries, does your GPU have compute capability 8.6 or 8.9? (only these two are supported by our precompiled libraries).
>
> On our systems, the end-to-end example (go run -tags=zeknox examples/zeknox/main.go) is working.

I'm using an AWS g4dn.xlarge instance, which according to the documentation has a T4 GPU. It appears to have compute capability 7.5.

Should it work if I compile the libraries myself? I started compiling them, but it took quite a while and I did not let it finish. When I benchmarked previously, g4dn offered quite a good balance between performance and $-per-proof cost.

doutv (Contributor) commented Dec 16, 2024

Yeah, compiling it yourself should work. Compiling BN254 MSM G2 takes ~5 minutes on our device, so expect a long compile time.

doutv (Contributor) commented Dec 16, 2024

Use this script
https://github.com/okx/zeknox/blob/main/native/build-release-msm-bn254.sh

ivokub (Collaborator) commented Dec 16, 2024

> Use this script https://github.com/okx/zeknox/blob/main/native/build-release-msm-bn254.sh

Indeed, I got it working, and the speedup is similar to the one claimed in the PR (1.6x). I also had to build libblst.

But now there seems to be an issue with the proof itself. Verification fails with an invalid proof:

panic: points in the proof are not in the correct subgroup

I could try looking into it, but it would probably take some time to compare the computed values against the CPU execution. Would it be possible to try another GPU and see if you hit the same problem?

doutv (Contributor) commented Dec 17, 2024

This is an edge case. We have seen this bug before and tried several fixes, but it still occurs.
I will look into it.

Labels
consolidate strengthen an existing feature new feature
3 participants